

 first-order regret



Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

Neural Information Processing Systems

On the technical side, we show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees, and we combine this observation with new refinements to the regression oracle reduction framework of Foster and Rakhlin [29].
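For readers unfamiliar with the two quantities named above, here is a minimal sketch that computes them for discrete distributions. It assumes the common definition of triangular discrimination, D(p, q) = sum_i (p_i - q_i)^2 / (p_i + q_i), and the standard binary logarithmic loss. The paper may use a different normalization, and the function names are illustrative only. This is not the paper's algorithm.

```python
import math


def triangular_discrimination(p, q):
    """Triangular discrimination sum_i (p_i - q_i)^2 / (p_i + q_i)
    between two discrete distributions given as equal-length sequences.
    (Assumes the common unnormalized definition.)"""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        s = pi + qi
        if s > 0:  # coordinates where both masses are zero contribute nothing
            total += (pi - qi) ** 2 / s
    return total


def log_loss(y_hat, y):
    """Logarithmic loss of predicting probability y_hat (strictly in (0, 1))
    for a binary outcome y in {0, 1}."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


# Bernoulli(0.1) vs. Bernoulli(0.3), written as distributions over {1, 0}
p = [0.1, 0.9]
q = [0.3, 0.7]
print(triangular_discrimination(p, q))  # ≈ 0.125  (0.04/0.4 + 0.04/1.6)
print(log_loss(0.3, 1))                 # equals -ln(0.3)
```

The triangular discrimination is symmetric in its arguments and equals zero exactly when the two distributions coincide. That makes it a natural way to measure how far a predicted loss distribution is from the true one.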



Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits

Neural Information Processing Systems

In addition, we need only assumptions weaker than those of existing algorithms: our algorithms work on discrete as well as continuous action sets without a priori knowledge about losses, and they run efficiently if a linear optimization oracle for the action set is available.







Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

Neural Information Processing Systems

Contextual bandits encompass both the general problem of statistical learning with function approximation (specifically, cost-sensitive classification) and the classical multi-armed bandit problem, yet present algorithmic challenges greater than the sum of both parts.